This report is designed to provide summary statistics for
FoodMicrobionet, version 5.0 or higher. It takes the FMBN_plus list as
input and returns text, tables and graphs.
These results are for FoodMicrobionet version .
The number of studies in FMBN is 251. This version includes studies on fungal microbiota only (11), on bacterial microbiota only (230), and studies for which data on both bacterial and fungal microbiota are available (10). However, due to inconsistencies in the deposit of sequences in SRA (in several cases the same sample was deposited with two separate biosample accessions and/or data for bacteria and fungi were deposited with different bioproject or study accessions), the same samples might be present in two studies, one for bacteria and one for fungi¹. We did our best to match samples in these situations. The addition of datasets on fungi is in progress: when available, we will progressively add fungal data for the studies already in FoodMicrobionet with bacterial community data, and we will add more fungal studies.
The largest growth in studies and samples has been between version 3.1 (published in 2019) and version 3.2 (unpublished). Note that some older studies were annotated as belonging to version 5.0 when fungal data were added.
FMBN grows by the addition of sequences deposited in NCBI SRA for published studies. As a consequence, the use of targets (16S rRNA, 16S rRNA gene, ITS) reflects what is published, as well as the correlation between platforms and targets.
Recently, the number of studies using newer Illumina platforms (other
than MiSeq) has been growing quickly.
The regions used as targets also reflect the use of different
platforms and the preference for given targets (V4 and V3-V4), at least
for bacteria. Data for fungi are still insufficient to define a
trend.
The following table reports the number of studies available for each
platform and the cross-tabulation of studies by gene target and
platform.
| region | 454 GS | Illumina | Ion Torrent | Sum |
|---|---|---|---|---|
| ITS1 | 0 | 6 | 0 | 6 |
| ITS1 and V3-V4 | 0 | 8 | 0 | 8 |
| ITS1 and V4 | 0 | 1 | 0 | 1 |
| ITS1+ITS2 | 1 | 0 | 0 | 1 |
| ITS2 | 0 | 4 | 0 | 4 |
| ITS2 and V3-V4 | 0 | 1 | 0 | 1 |
| V1-V2 | 1 | 2 | 2 | 5 |
| V1-V3 | 33 | 13 | 0 | 46 |
| V2-V3 | 1 | 0 | 0 | 1 |
| V3 | 1 | 3 | 0 | 4 |
| V3-V4 | 1 | 122 | 0 | 123 |
| V4 | 2 | 33 | 3 | 38 |
| V4-V5 | 2 | 5 | 0 | 7 |
| V4-V6 | 0 | 1 | 0 | 1 |
| V5 | 1 | 0 | 0 | 1 |
| V5-V6 | 0 | 1 | 1 | 2 |
| V5-V9 | 1 | 0 | 0 | 1 |
| V6-V8 | 0 | 1 | 0 | 1 |
| Sum | 44 | 201 | 6 | 251 |
A large number of studies target V3-V4 or V4 for bacteria and ITS1 for fungi.
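The layout of the table above (studies counted by region and platform, with row and column totals labelled Sum) is a plain cross-tabulation with margins. The sketch below is in Python rather than the R used to build this report, and assumes a hypothetical input of one (region, platform) pair per study; it illustrates the idea and is not the FMBN code.

```python
from collections import Counter


def crosstab_with_sums(records):
    """Cross-tabulate (region, platform) pairs, one pair per study,
    and append 'Sum' margins for rows and columns."""
    records = list(records)
    counts = Counter(records)
    regions = sorted({region for region, _ in records})
    platforms = sorted({platform for _, platform in records})

    table = {}
    for region in regions:
        # Counter returns 0 for (region, platform) pairs never seen
        row = {p: counts[(region, p)] for p in platforms}
        row["Sum"] = sum(row.values())  # row margin
        table[region] = row

    # Column margins; the bottom-right cell is the grand total
    sum_row = {p: sum(table[r][p] for r in regions) for p in platforms}
    sum_row["Sum"] = sum(sum_row.values())
    table["Sum"] = sum_row
    return table


# Hypothetical toy input, not FMBN data
studies = [
    ("V3-V4", "Illumina"),
    ("V3-V4", "Illumina"),
    ("V1-V3", "454 GS"),
    ("V4", "Ion Torrent"),
    ("V4", "Illumina"),
]
for region, row in crosstab_with_sums(studies).items():
    print(region, row)
```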
The distribution of studies by platform and region reflects current practices in metataxonomic analysis of food microbial communities. With the phasing out of 454 GS (most 454 GS studies targeted V1-V3), the majority of bacterial studies now use Illumina; V3-V4 alone accounts for 55.23% of studies, followed by V1-V3 (18.83%).
Data for fungi (21 studies only) are summarized below:
However, data for fungi in FoodMicrobionet are still too scarce to draw
any conclusion on target preferences.
The majority of studies are on dairy products.
The number of samples in FMBN has been rising almost exponentially with time and is now 14976. There are 14035 samples with data on bacteria and 1114 samples with data on fungi. For 366 samples, we were able to match the bacterial and fungal data.
FMBN is the richest database in terms of number of samples for foods and food environments, and it is also the best annotated one.
A few statistics on samples are shown below. The following table shows the proportion and cumulative proportion of unique food samples and food environment samples, classified by the L1 level of the FoodEx2 classification. Mock communities and extraction blanks are included.
| L1 | n | prop | cumprop |
|---|---|---|---|
| Milk and dairy products | 5807 | 0.3931 | 0.3931 |
| Meat and meat products | 3605 | 0.2440 | 0.6371 |
| Vegetables and vegetable products | 1508 | 0.1021 | 0.7391 |
| Fruit and fruit products | 1178 | 0.0797 | 0.8189 |
| Fish, seafood, amphibians, reptiles and invertebrates | 814 | 0.0551 | 0.8740 |
| Alcoholic beverages | 513 | 0.0347 | 0.9087 |
| Major isolated ingredients, additives, flavours, baking and processing aids | 339 | 0.0229 | 0.9316 |
| Seasoning, sauces and condiments | 202 | 0.0137 | 0.9453 |
| Grains and grain-based products | 155 | 0.0105 | 0.9558 |
| Composite dishes | 147 | 0.0099 | 0.9658 |
| Fruit and vegetable juices and nectars (including concentrates) | 82 | 0.0056 | 0.9713 |
| Legumes, nuts, oilseeds and spices | 81 | 0.0055 | 0.9768 |
| Generic food environments | 64 | 0.0043 | 0.9811 |
| Food products for young population | 61 | 0.0041 | 0.9852 |
| Eggs and egg products | 56 | 0.0038 | 0.9890 |
| Animal and vegetable fats and oils and primary derivatives thereof | 53 | 0.0036 | 0.9926 |
| Sugar and similar, confectionery and water-based sweet desserts | 42 | 0.0028 | 0.9955 |
| Extraction blank | 25 | 0.0017 | 0.9972 |
| Starchy roots or tubers and products thereof, sugar plants | 17 | 0.0012 | 0.9983 |
| Coffee, cocoa, tea and infusions | 15 | 0.0010 | 0.9993 |
| Mock community | 10 | 0.0007 | 1.0000 |
Samples in FMBN belong to 21 major food groups (L1 level of FoodEx2
exposure classification).
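The prop and cumprop columns of the L1 table above follow directly from the counts: each count is divided by the total, and the cumulative proportion is the running sum of counts divided by the total. A minimal Python sketch, using the counts from that table (the function name is hypothetical):

```python
def proportion_table(counts, digits=4):
    """Return (label, n, prop, cumprop) rows, sorted by decreasing n.

    cumprop is computed from the running sum of raw counts and rounded
    only at the end, so the last row is exactly 1.
    """
    total = sum(counts.values())
    rows, running = [], 0
    # sorted() is stable, so ties keep their input order
    for label, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        rows.append((label, n, round(n / total, digits), round(running / total, digits)))
    return rows


# Counts copied from the L1 table above
l1_counts = {
    "Milk and dairy products": 5807,
    "Meat and meat products": 3605,
    "Vegetables and vegetable products": 1508,
    "Fruit and fruit products": 1178,
    "Fish, seafood, amphibians, reptiles and invertebrates": 814,
    "Alcoholic beverages": 513,
    "Major isolated ingredients, additives, flavours, baking and processing aids": 339,
    "Seasoning, sauces and condiments": 202,
    "Grains and grain-based products": 155,
    "Composite dishes": 147,
    "Fruit and vegetable juices and nectars (including concentrates)": 82,
    "Legumes, nuts, oilseeds and spices": 81,
    "Generic food environments": 64,
    "Food products for young population": 61,
    "Eggs and egg products": 56,
    "Animal and vegetable fats and oils and primary derivatives thereof": 53,
    "Sugar and similar, confectionery and water-based sweet desserts": 42,
    "Extraction blank": 25,
    "Starchy roots or tubers and products thereof, sugar plants": 17,
    "Coffee, cocoa, tea and infusions": 15,
    "Mock community": 10,
}
for row in proportion_table(l1_counts)[:3]:
    print(row)
```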
There are 2634 environmental samples and 12125 food samples. Samples in
FMBN are further classified using levels L4 and L6 of the FoodEx2
exposure classification, and additional fields (which make it possible
to identify raw products, intermediates or finished products, the level
of thermal treatment and the occurrence of spoilage and/or fermentation)
allow a finer classification. Samples in FMBN belong to 134 L4 food
groups and 239 L6 food groups. There are 199 foodIds (food types) and,
combining further information on samples (nature, heat treatment,
spoilage/fermentation), 388 distinct combinations.
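Counting food types and combinations amounts to counting distinct values of one field and distinct tuples of several fields. The Python sketch below illustrates this on toy records; the field names (`foodId`, `nature`, `heat_treatment`, `spoilage_fermentation`) and their values are hypothetical stand-ins for the actual columns of the FMBN samples table.

```python
from collections import Counter

# Hypothetical field names; the actual columns of the FMBN samples table
# may be named differently.
COMBO_FIELDS = ("foodId", "nature", "heat_treatment", "spoilage_fermentation")


def combination_counts(samples, fields=COMBO_FIELDS):
    """Number of samples for each distinct combination of `fields`."""
    return Counter(tuple(sample[f] for f in fields) for sample in samples)


# Hypothetical toy records, not FMBN data
toy_samples = [
    {"foodId": "cheese", "nature": "finished", "heat_treatment": "raw",
     "spoilage_fermentation": "fermented"},
    {"foodId": "cheese", "nature": "finished", "heat_treatment": "raw",
     "spoilage_fermentation": "fermented"},
    {"foodId": "cheese", "nature": "raw material", "heat_treatment": "raw",
     "spoilage_fermentation": "none"},
    {"foodId": "salami", "nature": "finished", "heat_treatment": "raw",
     "spoilage_fermentation": "fermented"},
]
combos = combination_counts(toy_samples)
n_food_types = len({s["foodId"] for s in toy_samples})
print(n_food_types, len(combos))  # 2 food types, 3 combinations
```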
The structure of FoodMicrobionet allows the user to fine-tune each search and extract just the combination of samples s/he desires. Below, I show a few statistics on the number of sequences, by region. However, the user can perform searches based on the type of target, the region, the length of sequences per sample and even the occurrence of issues during the bioinformatic analysis (low number of sequences, high proportion of losses in a specific phase of the pipeline).
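A search of this kind is a conjunction of optional filters. The Python sketch below shows the idea; the `Sample` record and its fields are hypothetical simplifications (the real samples table has many more columns), and the thresholds in the usage line are arbitrary.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Sample:
    # Hypothetical fields standing in for columns of the FMBN samples table
    sample_id: str
    target: str       # e.g. "16S rRNA gene" or "ITS"
    region: str       # e.g. "V3-V4"
    n_reads: int      # sequences retained for the sample
    n_issues: int     # issues flagged during bioinformatic processing


def select_samples(
    samples: Iterable[Sample],
    target: Optional[str] = None,
    regions: Optional[Set[str]] = None,
    min_reads: int = 0,
    max_issues: Optional[int] = None,
) -> List[Sample]:
    """Keep samples matching every criterion; None means 'no filter'."""
    return [
        s
        for s in samples
        if (target is None or s.target == target)
        and (regions is None or s.region in regions)
        and s.n_reads >= min_reads
        and (max_issues is None or s.n_issues <= max_issues)
    ]


# Hypothetical toy records, not FMBN data
toy = [
    Sample("s1", "16S rRNA gene", "V3-V4", 25000, 0),
    Sample("s2", "16S rRNA gene", "V1-V3", 8000, 2),
    Sample("s3", "ITS", "ITS1", 15000, 1),
]
picked = select_samples(toy, target="16S rRNA gene", regions={"V3-V4", "V4"}, min_reads=10000)
print([s.sample_id for s in picked])  # ['s1']
```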
Starting from version 4.1.2, the geographic location of samples (when
provided in metadata) was added to the samples table.
We plan to fill in this information for existing samples and will
continue adding it to new samples. However, interested users should
always double check the original paper for the meaning of the
coordinates (do they refer to the place of sampling or to the origin of
the food? For example, a Japanese study analysed imported French
cheeses: which should be the location?).
| geo_loc_continent | n | prop | cumprop |
|---|---|---|---|
| Europe | 8137 | 0.5508 | 0.5508 |
| North America | 2515 | 0.1702 | 0.7210 |
| Asia | 2212 | 0.1497 | 0.8707 |
| Oceania | 901 | 0.0610 | 0.9317 |
| Africa | 520 | 0.0352 | 0.9669 |
| South America | 316 | 0.0214 | 0.9883 |
| Belgium | 104 | 0.0070 | 0.9953 |
| NA | 69 | 0.0047 | 1.0000 |
| geo_loc_country | n | prop | cumprop |
|---|---|---|---|
| Italy | 3296 | 0.2231 | 0.2231 |
| United States of America | 1795 | 0.1215 | 0.3446 |
| France | 1423 | 0.0963 | 0.4409 |
| Norway | 941 | 0.0637 | 0.5046 |
| Australia | 893 | 0.0604 | 0.5650 |
| China | 879 | 0.0595 | 0.6245 |
| South Korea | 660 | 0.0447 | 0.6692 |
| Canada | 600 | 0.0406 | 0.7098 |
| Ireland | 599 | 0.0405 | 0.7504 |
| United Kingdom | 371 | 0.0251 | 0.7755 |
| Sweden | 355 | 0.0240 | 0.7995 |
| Belgium | 285 | 0.0193 | 0.8188 |
| Cyprus | 251 | 0.0170 | 0.8358 |
| Brazil | 217 | 0.0147 | 0.8505 |
| Spain | 186 | 0.0126 | 0.8631 |
| Senegal | 120 | 0.0081 | 0.8712 |
| Ivory Coast | 119 | 0.0081 | 0.8792 |
| Europe | 104 | 0.0070 | 0.8863 |
| NA | 103 | 0.0070 | 0.8933 |
| Japan | 101 | 0.0068 | 0.9001 |
| Finland | 96 | 0.0065 | 0.9066 |
| Austria | 91 | 0.0062 | 0.9128 |
| Thailand | 84 | 0.0057 | 0.9184 |
| Greenland | 75 | 0.0051 | 0.9235 |
| Benin | 66 | 0.0045 | 0.9280 |
| Netherlands | 63 | 0.0043 | 0.9322 |
| Malaysia | 61 | 0.0041 | 0.9364 |
| Portugal | 59 | 0.0040 | 0.9404 |
| Cameroon | 58 | 0.0039 | 0.9443 |
| Germany | 58 | 0.0039 | 0.9482 |
| Israel | 56 | 0.0038 | 0.9520 |
| Denmark | 46 | 0.0031 | 0.9551 |
| Mexico | 45 | 0.0030 | 0.9582 |
| South Africa | 42 | 0.0028 | 0.9610 |
| Colombia | 40 | 0.0027 | 0.9637 |
| Laos | 40 | 0.0027 | 0.9664 |
| Switzerland | 36 | 0.0024 | 0.9689 |
| Estonia | 32 | 0.0022 | 0.9710 |
| Pakistan | 31 | 0.0021 | 0.9731 |
| Croatia | 28 | 0.0019 | 0.9750 |
| Brasil | 26 | 0.0018 | 0.9768 |
| Ethiopia | 23 | 0.0016 | 0.9783 |
| Hungary | 23 | 0.0016 | 0.9799 |
| Iceland | 20 | 0.0014 | 0.9813 |
| Russia | 20 | 0.0014 | 0.9826 |
| Argentina | 17 | 0.0012 | 0.9838 |
| Poland | 16 | 0.0011 | 0.9848 |
| Bosnia-Erzegovina | 15 | 0.0010 | 0.9859 |
| Iran | 15 | 0.0010 | 0.9869 |
| Greece | 14 | 0.0009 | 0.9878 |
| Madagascar | 14 | 0.0009 | 0.9888 |
| Kazakhstan | 13 | 0.0009 | 0.9896 |
| Zambia | 13 | 0.0009 | 0.9905 |
| Gabon | 12 | 0.0008 | 0.9913 |
| Serbia | 11 | 0.0007 | 0.9921 |
| Ghana | 10 | 0.0007 | 0.9928 |
| Guyana | 10 | 0.0007 | 0.9934 |
| Maldives | 10 | 0.0007 | 0.9941 |
| Nigeria | 8 | 0.0005 | 0.9947 |
| Papua New Guinea | 8 | 0.0005 | 0.9952 |
| Georgia | 7 | 0.0005 | 0.9957 |
| Chile | 6 | 0.0004 | 0.9961 |
| Guinea | 6 | 0.0004 | 0.9965 |
| Morocco | 6 | 0.0004 | 0.9969 |
| Bulgaria | 5 | 0.0003 | 0.9972 |
| Great Britain | 5 | 0.0003 | 0.9976 |
| Zimbabwe | 5 | 0.0003 | 0.9979 |
| Rwanda | 4 | 0.0003 | 0.9982 |
| Svalbard | 4 | 0.0003 | 0.9984 |
| Uganda | 4 | 0.0003 | 0.9987 |
| Burkina Faso | 3 | 0.0002 | 0.9989 |
| Tanzania | 3 | 0.0002 | 0.9991 |
| Czech Republic | 2 | 0.0001 | 0.9993 |
| Indonesia | 2 | 0.0001 | 0.9994 |
| Kenya | 2 | 0.0001 | 0.9995 |
| Lithuania | 2 | 0.0001 | 0.9997 |
| Namibia | 2 | 0.0001 | 0.9998 |
| Viet Nam | 2 | 0.0001 | 0.9999 |
| Latvia | 1 | 0.0001 | 1.0000 |
These statistics can be easily calculated separately for bacteria and fungi.
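One way to obtain separate statistics is to tally the field of interest within each group. The Python sketch below assumes hypothetical records with a `group` field ("bacteria" or "fungi"); a sample with matched bacterial and fungal data would appear once in each group.

```python
from collections import Counter, defaultdict


def counts_by_group(samples, group_key, count_key):
    """Tally `count_key` values separately within each `group_key` value."""
    grouped = defaultdict(Counter)
    for sample in samples:
        grouped[sample[group_key]][sample[count_key]] += 1
    return grouped


def proportions(counter, digits=4):
    """Proportion of each value, in decreasing order of count."""
    total = sum(counter.values())
    return {k: round(n / total, digits) for k, n in counter.most_common()}


# Hypothetical toy records, not FMBN data
toy = [
    {"group": "bacteria", "geo_loc_continent": "Europe"},
    {"group": "bacteria", "geo_loc_continent": "Europe"},
    {"group": "bacteria", "geo_loc_continent": "Asia"},
    {"group": "fungi", "geo_loc_continent": "Europe"},
]
by_group = counts_by_group(toy, "group", "geo_loc_continent")
for group, counter in by_group.items():
    print(group, proportions(counter))
```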
For all studies belonging to version 1.1 or higher, FoodMicrobionet data were generated with a dedicated pipeline using SILVA for taxonomic assignment; from version 5.0, UNITE has been used as the taxonomic reference for fungal ITS sequences. A few tweaks to the taxonomy are needed for coherence and for compatibility with external databases.
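As an example of why such tweaks matter, the phylum table for bacteria in this report lists spelling variants of the same name, such as "Marinimicrobia (SAR406 clade)" and "Marinimicrobia_(SAR406_clade)". The Python sketch below shows one hypothetical normalization (underscores to spaces) that would merge such variants; it is an illustration, not the set of tweaks actually applied in FMBN.

```python
import re
from collections import Counter


def normalize_taxon_name(name: str) -> str:
    """Turn underscores into spaces and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", name.replace("_", " ")).strip()


def merge_variants(counts):
    """Sum counts of names that become identical after normalization."""
    merged = Counter()
    for name, n in counts.items():
        merged[normalize_taxon_name(name)] += n
    return merged


# Names and counts as they appear in the bacterial phylum table
phyla = {
    "Marinimicrobia (SAR406 clade)": 1,
    "Marinimicrobia_(SAR406_clade)": 1,
    "SAR324 clade(Marine group B)": 1,
    "SAR324_clade(Marine_group_B)": 1,
}
print(merge_variants(phyla))
```

Applied blindly, this rule would also rewrite UNITE placeholders such as "Fungi_phy_Incertae_sedis", so a real implementation might restrict it to names from SILVA.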
We always try to assign sequences to the lowest possible level (given the length and quality of the sequences). Statistics for taxonomic assignment are shown below: the first table refers to bacteria and the second to fungi.
| idelevel | n | prop | cumprop |
|---|---|---|---|
| species | 5147 | 0.524 | 0.524 |
| genus | 3361 | 0.342 | 0.867 |
| family | 648 | 0.066 | 0.933 |
| order | 364 | 0.037 | 0.970 |
| class | 223 | 0.023 | 0.993 |
| phylum | 69 | 0.007 | 1.000 |
| domain | 2 | 0.000 | 1.000 |
| idelevel | n | prop | cumprop |
|---|---|---|---|
| species | 2076 | 0.735 | 0.735 |
| genus | 612 | 0.217 | 0.952 |
| family | 91 | 0.032 | 0.984 |
| order | 31 | 0.011 | 0.995 |
| class | 11 | 0.004 | 0.999 |
| phylum | 3 | 0.001 | 1.000 |
| domain | 1 | 0.000 | 1.000 |
There are currently 12782 taxa in this version of FoodMicrobionet, identified at different levels. The proportion of taxa identified at the genus level or below is 0.867 for bacteria and 0.952 for fungi.
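The two proportions follow from the idelevel tables above: the counts at species and genus level are summed and divided by the total for each group. A minimal Python check (the function name is hypothetical):

```python
# Ranks from most to least specific, as in the idelevel column
RANKS = ["species", "genus", "family", "order", "class", "phylum", "domain"]


def prop_at_or_below(counts, level="genus"):
    """Share of taxa identified at `level` or at a more specific rank."""
    keep = RANKS[: RANKS.index(level) + 1]
    total = sum(counts.values())
    return sum(counts.get(rank, 0) for rank in keep) / total


# Counts copied from the idelevel tables (bacteria first, then fungi)
bacteria = {"species": 5147, "genus": 3361, "family": 648, "order": 364,
            "class": 223, "phylum": 69, "domain": 2}
fungi = {"species": 2076, "genus": 612, "family": 91, "order": 31,
         "class": 11, "phylum": 3, "domain": 1}
print(round(prop_at_or_below(bacteria), 3), round(prop_at_or_below(fungi), 3))  # 0.867 0.952
```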
| phylum | n | prop | cumprop | cumn |
|---|---|---|---|---|
| Proteobacteria | 3257 | 0.3319 | 0.3319 | 3257 |
| Firmicutes | 2089 | 0.2129 | 0.5447 | 5346 |
| Actinobacteriota | 1366 | 0.1392 | 0.6839 | 6712 |
| Bacteroidota | 1330 | 0.1355 | 0.8194 | 8042 |
| Cyanobacteria | 255 | 0.0260 | 0.8454 | 8297 |
| Verrucomicrobiota | 149 | 0.0152 | 0.8606 | 8446 |
| Desulfobacterota | 136 | 0.0139 | 0.8745 | 8582 |
| Acidobacteriota | 134 | 0.0137 | 0.8881 | 8716 |
| Planctomycetota | 122 | 0.0124 | 0.9006 | 8838 |
| Chloroflexi | 121 | 0.0123 | 0.9129 | 8959 |
| Patescibacteria | 107 | 0.0109 | 0.9238 | 9066 |
| Campylobacterota | 76 | 0.0077 | 0.9315 | 9142 |
| Myxococcota | 73 | 0.0074 | 0.9390 | 9215 |
| Halobacterota | 60 | 0.0061 | 0.9451 | 9275 |
| Spirochaetota | 56 | 0.0057 | 0.9508 | 9331 |
| Deinococcota | 54 | 0.0055 | 0.9563 | 9385 |
| Fusobacteriota | 36 | 0.0037 | 0.9600 | 9421 |
| Synergistota | 34 | 0.0035 | 0.9634 | 9455 |
| Bdellovibrionota | 24 | 0.0024 | 0.9659 | 9479 |
| Crenarchaeota | 21 | 0.0021 | 0.9680 | 9500 |
| Fibrobacterota | 19 | 0.0019 | 0.9699 | 9519 |
| Elusimicrobiota | 17 | 0.0017 | 0.9717 | 9536 |
| Gemmatimonadota | 17 | 0.0017 | 0.9734 | 9553 |
| Armatimonadota | 15 | 0.0015 | 0.9749 | 9568 |
| Cloacimonadota | 14 | 0.0014 | 0.9764 | 9582 |
| Halanaerobiaeota | 14 | 0.0014 | 0.9778 | 9596 |
| Nitrospirota | 13 | 0.0013 | 0.9791 | 9609 |
| Thermoplasmatota | 13 | 0.0013 | 0.9804 | 9622 |
| Tenericutes | 11 | 0.0011 | 0.9816 | 9633 |
| Thermotogota | 11 | 0.0011 | 0.9827 | 9644 |
| Euryarchaeota | 9 | 0.0009 | 0.9836 | 9653 |
| Methylomirabilota | 9 | 0.0009 | 0.9845 | 9662 |
| Deferribacterota | 7 | 0.0007 | 0.9852 | 9669 |
| Dependentiae | 7 | 0.0007 | 0.9859 | 9676 |
| Nitrospinota | 5 | 0.0005 | 0.9864 | 9681 |
| TM7 | 5 | 0.0005 | 0.9870 | 9686 |
| Chlorobi | 4 | 0.0004 | 0.9874 | 9690 |
| GN02 | 4 | 0.0004 | 0.9878 | 9694 |
| Latescibacterota | 4 | 0.0004 | 0.9882 | 9698 |
| Nanoarchaeota | 4 | 0.0004 | 0.9886 | 9702 |
| Aquificota | 3 | 0.0003 | 0.9889 | 9705 |
| Calditrichota | 3 | 0.0003 | 0.9892 | 9708 |
| Chlamydiae | 3 | 0.0003 | 0.9895 | 9711 |
| Dictyoglomota | 3 | 0.0003 | 0.9898 | 9714 |
| OP3 | 3 | 0.0003 | 0.9901 | 9717 |
| AD3 | 2 | 0.0002 | 0.9903 | 9719 |
| Abditibacteriota | 2 | 0.0002 | 0.9905 | 9721 |
| BRC1 | 2 | 0.0002 | 0.9907 | 9723 |
| Caldatribacteriota | 2 | 0.0002 | 0.9909 | 9725 |
| Caldisericota | 2 | 0.0002 | 0.9911 | 9727 |
| Coprothermobacterota | 2 | 0.0002 | 0.9913 | 9729 |
| Entotheonellaeota | 2 | 0.0002 | 0.9915 | 9731 |
| Hydrogenedentes | 2 | 0.0002 | 0.9917 | 9733 |
| Lentisphaerae | 2 | 0.0002 | 0.9920 | 9735 |
| Modulibacteria | 2 | 0.0002 | 0.9922 | 9737 |
| Nanoarchaeaeota | 2 | 0.0002 | 0.9924 | 9739 |
| SBR1093 | 2 | 0.0002 | 0.9926 | 9741 |
| Spirochaetes | 2 | 0.0002 | 0.9928 | 9743 |
| Sumerlaeota | 2 | 0.0002 | 0.9930 | 9745 |
| Thermodesulfobiota | 2 | 0.0002 | 0.9932 | 9747 |
| WS3 | 2 | 0.0002 | 0.9934 | 9749 |
| NA | 2 | 0.0002 | 0.9936 | 9751 |
| ABY1_OD1 | 1 | 0.0001 | 0.9937 | 9752 |
| Acetothermia | 1 | 0.0001 | 0.9938 | 9753 |
| Aegiribacteria | 1 | 0.0001 | 0.9939 | 9754 |
| Aenigmarchaeota | 1 | 0.0001 | 0.9940 | 9755 |
| Altiarchaeota | 1 | 0.0001 | 0.9941 | 9756 |
| Atribacteria | 1 | 0.0001 | 0.9942 | 9757 |
| CCM11b | 1 | 0.0001 | 0.9943 | 9758 |
| Candidatus Tectomicrobia | 1 | 0.0001 | 0.9944 | 9759 |
| Chlorophyta | 1 | 0.0001 | 0.9945 | 9760 |
| DTB120 | 1 | 0.0001 | 0.9946 | 9761 |
| Dadabacteria | 1 | 0.0001 | 0.9947 | 9762 |
| Deferrisomatota | 1 | 0.0001 | 0.9948 | 9763 |
| Elusimicrobia | 1 | 0.0001 | 0.9949 | 9764 |
| FBP | 1 | 0.0001 | 0.9950 | 9765 |
| FCPU426 | 1 | 0.0001 | 0.9951 | 9766 |
| FW113 | 1 | 0.0001 | 0.9952 | 9767 |
| Fermentibacterota | 1 | 0.0001 | 0.9953 | 9768 |
| Firestonebacteria | 1 | 0.0001 | 0.9954 | 9769 |
| GAL15 | 1 | 0.0001 | 0.9955 | 9770 |
| GN06 | 1 | 0.0001 | 0.9956 | 9771 |
| GOUTA4 | 1 | 0.0001 | 0.9957 | 9772 |
| Hydrothermae | 1 | 0.0001 | 0.9958 | 9773 |
| Iainarchaeota | 1 | 0.0001 | 0.9959 | 9774 |
| LCP-89 | 1 | 0.0001 | 0.9960 | 9775 |
| Latescibacteria | 1 | 0.0001 | 0.9961 | 9776 |
| MBNT15 | 1 | 0.0001 | 0.9962 | 9777 |
| Margulisbacteria | 1 | 0.0001 | 0.9963 | 9778 |
| Marinimicrobia (SAR406 clade) | 1 | 0.0001 | 0.9964 | 9779 |
| Marinimicrobia_(SAR406_clade) | 1 | 0.0001 | 0.9965 | 9780 |
| Micrarchaeota | 1 | 0.0001 | 0.9966 | 9781 |
| NB1-j | 1 | 0.0001 | 0.9967 | 9782 |
| NKB15 | 1 | 0.0001 | 0.9968 | 9783 |
| NKB19 | 1 | 0.0001 | 0.9969 | 9784 |
| Nanohaloarchaeota | 1 | 0.0001 | 0.9970 | 9785 |
| OD1 | 1 | 0.0001 | 0.9971 | 9786 |
| OP11 | 1 | 0.0001 | 0.9972 | 9787 |
| OP8 | 1 | 0.0001 | 0.9974 | 9788 |
| Omnitrophicaeota | 1 | 0.0001 | 0.9975 | 9789 |
| PAUC34f | 1 | 0.0001 | 0.9976 | 9790 |
| Poribacteria | 1 | 0.0001 | 0.9977 | 9791 |
| RCP2-54 | 1 | 0.0001 | 0.9978 | 9792 |
| Rs-K70 termite group | 1 | 0.0001 | 0.9979 | 9793 |
| SAR324 clade(Marine group B) | 1 | 0.0001 | 0.9980 | 9794 |
| SAR324_clade(Marine_group_B) | 1 | 0.0001 | 0.9981 | 9795 |
| SC4 | 1 | 0.0001 | 0.9982 | 9796 |
| SM2F11 | 1 | 0.0001 | 0.9983 | 9797 |
| SPAM | 1 | 0.0001 | 0.9984 | 9798 |
| SR1 | 1 | 0.0001 | 0.9985 | 9799 |
| Schekmanbacteria | 1 | 0.0001 | 0.9986 | 9800 |
| Sva0485 | 1 | 0.0001 | 0.9987 | 9801 |
| TA06 | 1 | 0.0001 | 0.9988 | 9802 |
| TM6 | 1 | 0.0001 | 0.9989 | 9803 |
| TX1A-33 | 1 | 0.0001 | 0.9990 | 9804 |
| Thaumarchaeota | 1 | 0.0001 | 0.9991 | 9805 |
| Thermodesulfobacteria | 1 | 0.0001 | 0.9992 | 9806 |
| Thermotogae | 1 | 0.0001 | 0.9993 | 9807 |
| WOR-1 | 1 | 0.0001 | 0.9994 | 9808 |
| WPS-2 | 1 | 0.0001 | 0.9995 | 9809 |
| WS1 | 1 | 0.0001 | 0.9996 | 9810 |
| WS2 | 1 | 0.0001 | 0.9997 | 9811 |
| WS4 | 1 | 0.0001 | 0.9998 | 9812 |
| ZB2 | 1 | 0.0001 | 0.9999 | 9813 |
| Zixibacteria | 1 | 0.0001 | 1.0000 | 9814 |
| phylum | n | prop | cumprop | cumn |
|---|---|---|---|---|
| Ascomycota | 1677 | 0.5938 | 0.5938 | 1677 |
| Basidiomycota | 1084 | 0.3839 | 0.9777 | 2761 |
| Mucoromycota | 22 | 0.0078 | 0.9855 | 2783 |
| Mortierellomycota | 19 | 0.0067 | 0.9922 | 2802 |
| Chytridiomycota | 9 | 0.0032 | 0.9954 | 2811 |
| Olpidiomycota | 4 | 0.0014 | 0.9968 | 2815 |
| Glomeromycota | 3 | 0.0011 | 0.9979 | 2818 |
| Aphelidiomycota | 2 | 0.0007 | 0.9986 | 2820 |
| Blastocladiomycota | 1 | 0.0004 | 0.9989 | 2821 |
| Fungi_phy_Incertae_sedis | 1 | 0.0004 | 0.9993 | 2822 |
| Rozellomycota | 1 | 0.0004 | 0.9996 | 2823 |
| Zoopagomycota | 1 | 0.0004 | 1.0000 | 2824 |
The variety of taxa detected is very high, especially for Bacteria and Archaea. There are 124 different phyla of Bacteria and Archaea in this version of FoodMicrobionet.
The depth of taxonomic assignment depends on a number of factors (quality and length of the sequences, quality of the reference database, etc.). Here, we present tables and graphs on this subject for bacteria only (with UNITE, assignments at the species level are far more frequent, even for short sequences).
After some processing to obtain the information from the various tables, here is a box plot showing identifications at the genus level or below, by region, for bacteria only.
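The processing behind such a box plot can be sketched as computing, for each study, the share of ASVs identified at genus level or below, and then grouping these shares by region. The Python sketch below assumes hypothetical (study, region, idelevel) records, one per ASV, and reports only the median of each group (the centre line of a box plot).

```python
from collections import defaultdict
from statistics import median

GENUS_OR_BELOW = {"species", "genus"}


def study_proportions(asvs):
    """asvs: iterable of (study_id, region, idelevel) tuples, one per ASV.

    Returns {study_id: (region, share of ASVs identified at genus or below)}.
    """
    total = defaultdict(int)
    hits = defaultdict(int)
    region_of = {}
    for study, region, level in asvs:
        total[study] += 1
        hits[study] += int(level in GENUS_OR_BELOW)
        region_of[study] = region
    return {s: (region_of[s], hits[s] / total[s]) for s in total}


def median_by_region(props):
    """Median per-study proportion for each region."""
    grouped = defaultdict(list)
    for region, p in props.values():
        grouped[region].append(p)
    return {region: median(ps) for region, ps in grouped.items()}


# Hypothetical toy records, not FMBN data
toy_asvs = [
    ("s1", "V3-V4", "genus"), ("s1", "V3-V4", "family"),
    ("s2", "V3-V4", "species"), ("s2", "V3-V4", "genus"),
    ("s3", "V1-V3", "order"),
]
print(median_by_region(study_proportions(toy_asvs)))  # {'V3-V4': 0.75, 'V1-V3': 0.0}
```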
Longer sequences for which a good overlap between paired-end reads was
obtained (_TRUE) clearly result in a higher proportion of taxonomic
assignments at the genus level or below. There is some relationship with
the quality of sequences (number of issues encountered during
bioinformatic processing): in general, the more issues, the worse the
taxonomic assignment, but this is not always true.
However, if one takes into account the number of sequences rather than
just counting the ASVs for which assignment at the genus level or below
was possible, it is clear that a high proportion of total
sequences received taxonomic assignment at the genus level or below.
This situation likely depends on biases in the composition of reference taxonomic databases (in this case SILVA v138.1), in which the number of sequences varies widely among taxonomic groups.
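The difference between counting ASVs and counting sequences is the difference between an unweighted and a weighted proportion. In the hypothetical Python sketch below, one abundant ASV identified at genus level dominates the weighted figure even though most ASVs are assigned only at higher ranks.

```python
GENUS_OR_BELOW = {"species", "genus"}


def share_genus_or_below(asvs, weighted=False):
    """asvs: iterable of (idelevel, n_sequences) pairs for one study.

    Unweighted: each ASV counts once. Weighted: each ASV counts as many
    times as the sequences assigned to it.
    """
    num = den = 0
    for level, n_seqs in asvs:
        w = n_seqs if weighted else 1
        den += w
        if level in GENUS_OR_BELOW:
            num += w
    return num / den


# Hypothetical toy study, not FMBN data
toy = [("genus", 900), ("family", 50), ("order", 50)]
print(share_genus_or_below(toy), share_genus_or_below(toy, weighted=True))  # about 0.333 vs 0.9
```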
The ability to obtain a taxonomic assignment down to the genus level varies by phylum. Only the 4 most abundant phyla are shown.
This is even more evident if the data are weighted using the number of sequences and if only the most common target regions are used.
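The per-phylum comparison described above can be sketched by selecting the most abundant phyla and computing, for each, the share of ASVs assigned at genus level or below. In this hypothetical Python sketch, abundance is measured as number of ASVs, which is an assumption; ranking by number of sequences would be an equally reasonable choice.

```python
from collections import Counter

GENUS_OR_BELOW = {"species", "genus"}


def genus_share_top_phyla(taxa, top_n=4):
    """taxa: iterable of (phylum, idelevel) pairs, one per ASV.

    Ranks phyla by number of ASVs (an assumption) and returns, for the
    top_n phyla, the share of ASVs identified at genus level or below.
    """
    taxa = list(taxa)
    n_asvs = Counter(phylum for phylum, _ in taxa)
    hits = Counter(phylum for phylum, level in taxa if level in GENUS_OR_BELOW)
    return {phylum: hits[phylum] / n for phylum, n in n_asvs.most_common(top_n)}


# Hypothetical toy records, not FMBN data
toy = ([("Firmicutes", "genus")] * 3 + [("Firmicutes", "family")]
       + [("Proteobacteria", "species")] * 2 + [("Cyanobacteria", "order")])
print(genus_share_top_phyla(toy, top_n=2))  # {'Firmicutes': 0.75, 'Proteobacteria': 1.0}
```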